#This exercise accompanies the lessons in Environmental Data Analytics on Data Exploration.
<FirstLast>_A03_DataExploration.Rmd (replacing
<FirstLast> with your first and last name). The completed exercise is due on Sept 30th.
#setwd("/home/guest/R/EDA-Fall2022")
#change wd from assignments to main EDA folder
#install.packages("tinytex")
#tinytex::install_tinytex()
#Allow for code and comments to not run off page when knitting file. Note- add to top of every rmd
#install.packages('formatR')
knitr::opts_chunk$set(tidy.opts=list(width.cutoff=80), tidy=TRUE)
#installing necessary packages; once installed, add # so the assignment can knit
#install.packages("tidyverse")
#install.packages("dplyr")
library(ggplot2)
#install.packages("lubridate")
library(lubridate)
#adding datasets needed for assignment
Neonics.data <- read.csv("Data/Raw/ECOTOX_Neonicotinoids_Insects_raw.csv", stringsAsFactors = TRUE)
Litter.data <- read.csv("Data/Raw/NEON_NIWO_Litter_massdata_2018-08_raw.csv", stringsAsFactors = TRUE)
#str(Neonics.data$Effect)
#Explore dataset
#View(Neonics.data)
#class(Neonics.data) #data.frame
Answer: This database provides information on the adverse effects of single chemical stressors on aquatic and terrestrial species. Insects are important for pollination and food production, so understanding the stressors that affect them is important.
Answer: Understanding forest litter and woody debris helps us better understand the health of our mountains. This monitoring allows us to understand diverse ecosystems at multiple spatial and temporal scales. It also helps us better understand the biodiversity in our ecosystem. Litter and woody debris sampling helps provide an understanding of plants.
Answer: 1. Litter and fine woody debris are collected from elevated and ground traps. 2. The sampling design encompasses spatial parameters, such as NEON sites with woody vegetation >2 m. 3. Ground traps are sampled once per year, and more frequently depending on the density of vegetation and forest.
colnames(Neonics.data)
## [1] "CAS.Number" "Chemical.Name"
## [3] "Chemical.Grade" "Chemical.Analysis.Method"
## [5] "Chemical.Purity" "Species.Scientific.Name"
## [7] "Species.Common.Name" "Species.Group"
## [9] "Organism.Lifestage" "Organism.Age"
## [11] "Organism.Age.Units" "Exposure.Type"
## [13] "Media.Type" "Test.Location"
## [15] "Number.of.Doses" "Conc.1.Type..Author."
## [17] "Conc.1..Author." "Conc.1.Units..Author."
## [19] "Effect" "Effect.Measurement"
## [21] "Endpoint" "Response.Site"
## [23] "Observed.Duration..Days." "Observed.Duration.Units..Days."
## [25] "Author" "Reference.Number"
## [27] "Title" "Source"
## [29] "Publication.Year" "Summary.of.Additional.Parameters"
na.omit(Neonics.data)
na.omit(Litter.data)
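Note that na.omit returns a new object and leaves its input unchanged, so calling it without assignment only prints the filtered rows. A minimal sketch, using a made-up data frame (the names toy and toy_complete are illustrative and not part of the assignment):

```r
# Toy data frame with missing values; invented for illustration only
toy <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# na.omit() returns a copy with incomplete rows dropped; toy itself is unchanged
toy_complete <- na.omit(toy)

nrow(toy)           # the original still has every row
nrow(toy_complete)  # only rows without any NA remain
```

Whether dropping incomplete rows is appropriate depends on the analysis; for the raw datasets here it may remove most records.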
str(Neonics.data)
## 'data.frame': 4623 obs. of 30 variables:
## $ CAS.Number : int 58842209 58842209 58842209 58842209 58842209 58842209 58842209 58842209 58842209 58842209 ...
## $ Chemical.Name : Factor w/ 9 levels "(1E)-N-[(6-Chloro-3-pyridinyl)methyl]-N-ethyl-N'-methyl-2-nitro-1,1-ethenediamine",..: 9 9 9 9 9 9 9 9 9 9 ...
## $ Chemical.Grade : Factor w/ 9 levels "Analytical grade",..: 9 9 9 9 9 9 9 9 9 9 ...
## $ Chemical.Analysis.Method : Factor w/ 5 levels "Measured","Not coded",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ Chemical.Purity : Factor w/ 80 levels ">=98",">=99.0",..: 69 69 50 50 50 50 50 50 50 50 ...
## $ Species.Scientific.Name : Factor w/ 398 levels "Acalolepta vastator",..: 69 69 248 248 248 248 248 248 248 248 ...
## $ Species.Common.Name : Factor w/ 303 levels "Alfalfa Leafcutter Bee",..: 74 74 142 142 142 142 142 142 142 142 ...
## $ Species.Group : Factor w/ 4 levels "Insects/Spiders",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Organism.Lifestage : Factor w/ 20 levels "Adult","Cocoon",..: 1 1 19 19 19 1 19 1 1 19 ...
## $ Organism.Age : Factor w/ 39 levels "<=24","<=48",..: 39 39 39 39 39 36 39 36 36 39 ...
## $ Organism.Age.Units : Factor w/ 11 levels "Day(s)","Days post-emergence",..: 9 9 4 4 4 1 4 1 1 4 ...
## $ Exposure.Type : Factor w/ 24 levels "Choice","Dermal",..: 23 23 11 11 11 11 11 11 11 11 ...
## $ Media.Type : Factor w/ 10 levels "Agar","Artificial soil",..: 7 7 3 3 3 3 3 3 3 3 ...
## $ Test.Location : Factor w/ 4 levels "Field artificial",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ Number.of.Doses : Factor w/ 30 levels "' 4-5","' 4-7",..: 30 30 18 18 18 18 18 18 18 18 ...
## $ Conc.1.Type..Author. : Factor w/ 3 levels "Active ingredient",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Conc.1..Author. : Factor w/ 1006 levels "<0.0004","<0.025",..: 639 510 813 622 442 637 500 642 814 784 ...
## $ Conc.1.Units..Author. : Factor w/ 148 levels "%","% v/v","% w/v",..: 132 132 91 91 91 91 91 91 91 91 ...
## $ Effect : Factor w/ 19 levels "Accumulation",..: 16 16 16 16 16 16 16 16 16 16 ...
## $ Effect.Measurement : Factor w/ 155 levels "Abundance","Accuracy of learned task, performance",..: 87 87 87 87 87 87 87 87 87 87 ...
## $ Endpoint : Factor w/ 28 levels "EC10","EC50",..: 15 15 8 8 8 8 8 8 8 8 ...
## $ Response.Site : Factor w/ 19 levels "Abdomen","Brain",..: 14 14 14 14 14 14 14 14 14 14 ...
## $ Observed.Duration..Days. : Factor w/ 361 levels "<.0002","<.0021",..: 145 145 145 145 145 145 145 145 145 145 ...
## $ Observed.Duration.Units..Days. : Factor w/ 17 levels "Day(s)","Day(s) post-emergence",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Author : Factor w/ 433 levels "Abbott,V.A., J.L. Nadeau, H.A. Higo, and M.L. Winston",..: 66 66 181 181 181 181 181 181 181 181 ...
## $ Reference.Number : int 107388 107388 103312 103312 103312 103312 103312 103312 103312 103312 ...
## $ Title : Factor w/ 458 levels "A Common Pesticide Decreases Foraging Success and Survival in Honey Bees",..: 91 91 450 450 450 450 450 450 450 450 ...
## $ Source : Factor w/ 456 levels "Acta Hortic.1094:451-456",..: 295 295 296 296 296 296 296 296 296 296 ...
## $ Publication.Year : int 1982 1982 1986 1986 1986 1986 1986 1986 1986 1986 ...
## $ Summary.of.Additional.Parameters: Factor w/ 943 levels "Purity: \xca NC - NC | Organism Age: \xca NC - NC Not coded | Conc 1 (Author): \xca Not coded NR - NR AI lb/acr"| __truncated__,..: 572 547 122 120 124 228 119 230 233 121 ...
dim(Neonics.data)
## [1] 4623 30
length(Neonics.data)
## [1] 30
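The difference between dim and length above follows from a data frame being a list of columns. A small sketch with an invented data frame (toy is a hypothetical name) shows the same pattern:

```r
# Toy data frame: 3 rows, 2 columns (values are made up)
toy <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

dim(toy)     # number of rows, then number of columns
length(toy)  # number of columns, because a data frame is a list of columns
nrow(toy)    # rows only
ncol(toy)    # columns only
```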
# saving new datasets write.csv(Neonics.data, file =
# './Data/Ariel_processed/Neonics.data.v2', row.names=FALSE)
# write.csv(Litter.data, file = './Data/Ariel_processed/Litter.data.v2',
# row.names=FALSE)
Using the summary function on the “Effect” column,
determine the most common effects that are studied. Why might these
effects specifically be of interest?
summary(Neonics.data$Effect)
## Accumulation Avoidance Behavior Biochemistry
## 12 102 360 11
## Cell(s) Development Enzyme(s) Feeding behavior
## 9 136 62 255
## Genetics Growth Histology Hormone(s)
## 82 38 5 1
## Immunological Intoxication Morphology Mortality
## 16 12 22 1493
## Physiology Population Reproduction
## 7 1803 197
summary(Neonics.data$Effect.Measurement)
## Abundance
## 1699
## Mortality
## 1294
## Survival
## 133
## Progeny counts/numbers
## 120
## Food consumption
## 103
## Emergence
## 98
## Search/explore/forage behavior
## 96
## Feeding behavior, general
## 92
## Chemical avoidance
## 65
## Weight
## 48
## Distance moved, change in direct movement
## 38
## Feeding behavior
## 36
## Flying behavior
## 30
## Accuracy of learned task, performance
## 28
## Sex ratio
## 27
## Fecundity
## 26
## Stimulus avoidance
## 26
## Righting response
## 24
## Lifespan
## 23
## Acquired task
## 22
## Hatch
## 21
## Predatory behavior
## 21
## Acetylcholinesterase
## 20
## Walk
## 19
## Freezing behavior
## 18
## Reproductive success (general)
## 17
## Slowed, Retarded, Delayed or Non-development
## 17
## Grooming
## 16
## Diameter
## 14
## Residue
## 12
## Activity, general
## 11
## Food avoidance
## 11
## Control
## 9
## Developmental changes, general
## 9
## Intrinsic rate of increase
## 9
## Pollen collected
## 9
## Size
## 9
## Esterase
## 8
## Intoxication, general
## 8
## Mortality/survival, general
## 8
## Population change (change in N/change in time)
## 8
## Smell/Sniff
## 8
## Biomass
## 7
## Catalase mRNA
## 7
## Generation time
## 7
## Infected
## 7
## Orientation
## 7
## Population doubling time
## 7
## Population growth rate
## 7
## Sealed brood
## 7
## Vitellogenin mRNA
## 7
## Ali esterase
## 6
## Apoptosis, programmed cell death, DNA fragmentation
## 6
## Carboxylesterase
## 6
## Hemocyte
## 6
## Knockdown
## 6
## Viability
## 6
## Extinction
## 5
## Net Reproductive Rate
## 5
## Polyphenol oxidase
## 5
## Prey penetration
## 5
## Pupation
## 5
## Reproducing organisms
## 5
## Amount or percent animals infested with parasites
## 4
## Continual reinforcement task performed
## 4
## Defensin 1 mRNA
## 4
## Diversity, Evenness
## 4
## Encapsulation or Melanization Response
## 4
## General biochemical effect
## 4
## Glutathione S-transferase
## 4
## Histological changes, general
## 4
## Life expectancy
## 4
## Thioredoxin peroxidase mRNA
## 4
## Vanin-like protein 1-like mRNA
## 4
## Bees wax produced
## 3
## Behavioral changes, general
## 3
## Catalase
## 3
## Cell turnover
## 3
## Cytochrome P-450
## 3
## Feeding time
## 3
## Length
## 3
## Protein, total
## 3
## Respiration
## 3
## Response time to a stimulus
## 3
## Stage
## 3
## Time to first progeny
## 3
## Trehalase mRNA
## 3
## Alkaline phosphatase
## 2
## Carboxylesterase clade I, member 1 mRNA
## 2
## Centractin mRNA
## 2
## Chitinase 5 mRNA
## 2
## Colony maintenance (bees)
## 2
## COX2 mRNA
## 2
## Endoplasmin-like mRNA
## 2
## Gamete production
## 2
## Glucose dehydrogenase 2 mRNA
## 2
## Glucosinolate sulphatase mRNA
## 2
## Glutathione peroxidase-like 1 mRNA
## 2
## Glutathione peroxidase-like 2 mRNA
## 2
## (Other)
## 77
Answer: The most commonly studied effects in this dataset are population (1803 records), mortality (1493), behavior (360), and feeding behavior (255). These are all important aspects to study for species survival and for better understanding population estimates.
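Because summary lists the Effect levels alphabetically, the most common ones are easier to read off after sorting the counts. One possible approach, sketched on a small invented factor (effect and effect_counts are hypothetical names, and the values are not the real data):

```r
# Toy factor standing in for Neonics.data$Effect; values are invented
effect <- factor(c("Mortality", "Population", "Population",
                   "Behavior", "Mortality", "Population"))

# table() counts each level; sort() orders the counts from largest to smallest
effect_counts <- sort(table(effect), decreasing = TRUE)
effect_counts

# Names of the two most frequent levels
names(effect_counts)[1:2]
```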
Using the summary function, determine the six most
commonly studied species in the dataset (common name). What do these
species have in common, and why might they be of interest over other
insects? Feel free to do a brief internet search for more information if
needed.
summary(Neonics.data$Species.Common.Name)
## Honey Bee Parasitic Wasp
## 667 285
## Buff Tailed Bumblebee Carniolan Honey Bee
## 183 152
## Bumble Bee Italian Honeybee
## 140 113
## Japanese Beetle Asian Lady Beetle
## 94 76
## Euonymus Scale Wireworm
## 75 69
## European Dark Bee Minute Pirate Bug
## 66 62
## Asian Citrus Psyllid Parastic Wasp
## 60 58
## Colorado Potato Beetle Parasitoid Wasp
## 57 51
## Erythrina Gall Wasp Beetle Order
## 49 47
## Snout Beetle Family, Weevil Sevenspotted Lady Beetle
## 47 46
## True Bug Order Buff-tailed Bumblebee
## 45 39
## Aphid Family Cabbage Looper
## 38 38
## Sweetpotato Whitefly Braconid Wasp
## 37 33
## Cotton Aphid Predatory Mite
## 33 33
## Ladybird Beetle Family Parasitoid
## 30 30
## Scarab Beetle Spring Tiphia
## 29 29
## Thrip Order Ground Beetle Family
## 29 27
## Rove Beetle Family Tobacco Aphid
## 27 27
## Chalcid Wasp Convergent Lady Beetle
## 25 25
## Stingless Bee Spider/Mite Class
## 25 24
## Tobacco Flea Beetle Citrus Leafminer
## 24 23
## Ladybird Beetle Mason Bee
## 23 22
## Mosquito Argentine Ant
## 22 21
## Beetle Flatheaded Appletree Borer
## 21 20
## Horned Oak Gall Wasp Leaf Beetle Family
## 20 20
## Potato Leafhopper Tooth-necked Fungus Beetle
## 20 20
## Codling Moth Black-spotted Lady Beetle
## 19 18
## Calico Scale Fairyfly Parasitoid
## 18 18
## Lady Beetle Minute Parasitic Wasps
## 18 18
## Mirid Bug Mulberry Pyralid
## 18 18
## Silkworm Vedalia Beetle
## 18 18
## Araneoid Spider Order Bee Order
## 17 17
## Egg Parasitoid Insect Class
## 17 17
## Moth And Butterfly Order Oystershell Scale Parasitoid
## 17 17
## Hemlock Woolly Adelgid Lady Beetle Hemlock Wooly Adelgid
## 16 16
## Mite Onion Thrip
## 16 16
## Western Flower Thrips Corn Earworm
## 15 14
## Green Peach Aphid House Fly
## 14 14
## Ox Beetle Red Scale Parasite
## 14 14
## Spined Soldier Bug Armoured Scale Family
## 14 13
## Diamondback Moth Eulophid Wasp
## 13 13
## Monarch Butterfly Predatory Bug
## 13 13
## Yellow Fever Mosquito Braconid Parasitoid
## 13 12
## Common Thrip Eastern Subterranean Termite
## 12 12
## Jassid Mite Order
## 12 12
## Pea Aphid Pond Wolf Spider
## 12 12
## Spotless Ladybird Beetle Glasshouse Potato Wasp
## 11 10
## Lacewing Southern House Mosquito
## 10 10
## Two Spotted Lady Beetle Ant Family
## 10 9
## Apple Maggot (Other)
## 9 670
Answer: The six most commonly studied species are Honey Bee, Parasitic Wasp, Buff Tailed Bumblebee, Carniolan Honey Bee, Bumble Bee, and Italian Honeybee. Most of the top insects studied are bees, which are pollinators and a keystone species for our food production.
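The species output above is ordered by frequency and ends in an "(Other)" entry because summary on a factor keeps only the most frequent levels once the number of levels exceeds its maxsum argument. A hedged sketch of using maxsum to get a short "top n" list, with an invented factor (species and top_two are illustrative names):

```r
# Toy factor standing in for Species.Common.Name; counts are invented
species <- factor(c("Honey Bee", "Honey Bee", "Honey Bee",
                    "Parasitic Wasp", "Parasitic Wasp",
                    "Bumble Bee", "Wireworm"))

# maxsum = 3 keeps the 2 most frequent levels and lumps the rest into "(Other)"
top_two <- summary(species, maxsum = 3)
top_two
```

Under the same logic, summary(Neonics.data$Species.Common.Name, maxsum = 7) should return the six most common species plus "(Other)".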
class(Neonics.data$Conc.1..Author.)
## [1] "factor"
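Answer: class(Neonics.data$Conc.1..Author.) returns “factor” because the column is not purely numeric. Some entries contain non-numeric characters (for example "<0.0004", visible in the str output above), so read.csv with stringsAsFactors = TRUE reads the whole column as categorical rather than as continuous or discrete numbers.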
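To illustrate why such a column ends up as a factor and how it might be converted, here is a minimal sketch on invented values. The entries in conc, the "NR" code, and the choice to strip the "<" sign are assumptions for illustration, not a recommendation for the real data:

```r
# Toy values standing in for Conc.1..Author.; invented for illustration
conc <- factor(c("0.5", "<0.0004", "12", "NR"))
class(conc)

# Convert through character first: as.numeric() applied directly to a factor
# would return the internal level codes, not the printed values
conc_chr <- as.character(conc)

# Assumption: drop a leading "<" and let non-numeric codes such as "NR" become NA
conc_num <- suppressWarnings(as.numeric(sub("^<", "", conc_chr)))
conc_num
```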
Using geom_freqpoly, generate a plot of the number of
studies conducted by publication year.
ggplot(Neonics.data) + geom_freqpoly(aes(x = Publication.Year), bins = 15)
ggplot(Neonics.data) + geom_freqpoly(aes(x = Publication.Year), bins = 25)
ggplot(Neonics.data) + geom_freqpoly(aes(x = Publication.Year), bins = 15, color = "red")
ggplot(Neonics.data) + geom_freqpoly(aes(x = Publication.Year, color = Test.Location),
bins = 15)
Interpret this graph. What are the most common test locations, and do they differ over time?
Answer: The most common test locations are the lab and the natural field. Lab testing only surpassed natural field testing around 2009, after which it became the most common test location.
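The visual reading of the plot can be cross-checked with a table of counts per period and test location. A rough sketch on made-up records (the studies data frame, its values, and the decade breaks are all assumptions for illustration):

```r
# Toy records standing in for Neonics.data; years and locations are invented
studies <- data.frame(
  Publication.Year = c(1995, 1998, 2005, 2010, 2012, 2015),
  Test.Location = c("Field natural", "Field natural", "Lab",
                    "Lab", "Lab", "Field natural")
)

# Bin years into decades; intervals are closed on the right, e.g. (1990,2000]
studies$period <- cut(studies$Publication.Year,
                      breaks = c(1990, 2000, 2010, 2020), dig.lab = 4)

# Cross-tabulate period by test location
counts <- table(studies$period, studies$Test.Location)
counts
```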
ggplot(Neonics.data, aes(x = Endpoint)) + geom_bar()
# label_parsed(Endpoint, multi_line = TRUE) did not work try to space text on X
# axis
Answer: The two most common endpoints are LOEC and NOEL.
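For the crowded x-axis labels mentioned in the comment above, one common option is to rotate the tick labels through theme rather than reshaping the labels themselves. A minimal sketch, assuming ggplot2 is installed and using an invented data frame (toy and p are hypothetical names):

```r
library(ggplot2)

# Toy endpoint column; values are invented for illustration
toy <- data.frame(Endpoint = c("NOEL", "LOEC", "EC50", "NOEL", "EC10", "NOEL"))

# Rotate the x-axis labels 90 degrees so long or numerous labels do not overlap
p <- ggplot(toy, aes(x = Endpoint)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p
```

The same theme() call could be appended to the Neonics.data bar plot.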
Using the unique function, determine which dates litter was sampled
in August 2018.
# help('as.Date')
# Litter.data <- read.csv('Data/Raw/NEON_NIWO_Litter_massdata_2018-08_raw.csv',
#     stringsAsFactors = TRUE)
class(Litter.data$collectDate) # checking current class
## [1] "factor"
Litter.data$collectDate <- ymd(Litter.data$collectDate) # changing to Date format
class(Litter.data$collectDate) #checking the changed class
## [1] "Date"
unique(Litter.data$collectDate)
## [1] "2018-08-02" "2018-08-30"
# Litter.data.datetime <- as.Date.factor(Litter.data$collectDate, format=
# '%Y/%m/%d') - Do not use na.omit(Litter.data.datetime)
# format(Litter.data$collectDate, format= '%Y/%m/%d') do not use
# class(Litter.data.datetime$collectDate)
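The commented-out attempts above likely failed because the format string uses slashes while the data use hyphens. A base R sketch of the same conversion, using invented values in the same layout as collectDate (collect and dates are illustrative names):

```r
# Toy factor in the same "YYYY-MM-DD" layout as collectDate
collect <- factor(c("2018-08-02", "2018-08-02", "2018-08-30"))

# Convert through character, with a format that matches the hyphen separators
dates <- as.Date(as.character(collect), format = "%Y-%m-%d")
class(dates)
unique(dates)

# A format that does not match the separators yields NA instead of a date
mismatch <- as.Date("2018-08-02", format = "%Y/%m/%d")
mismatch
```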
Using the unique function, determine how many plots
were sampled at Niwot Ridge. How is the information obtained from
unique different from that obtained from
summary?
# help('unique')
# help(summary)
unique(Litter.data$plotID)
## [1] NIWO_061 NIWO_064 NIWO_067 NIWO_040 NIWO_041 NIWO_063 NIWO_047 NIWO_051
## [9] NIWO_058 NIWO_046 NIWO_062 NIWO_057
## 12 Levels: NIWO_040 NIWO_041 NIWO_046 NIWO_047 NIWO_051 NIWO_057 ... NIWO_067
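Answer: Twelve plots were sampled at Niwot Ridge. The unique function returns a vector, data frame, or array with duplicate elements/rows removed, so each plot ID appears once. summary is a generic function that summarizes its argument; for a factor such as plotID, it returns the number of records at each level.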
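The contrast between the two functions can be seen on a small invented factor (plot_id is a hypothetical stand-in for Litter.data$plotID):

```r
# Toy plot IDs with one repeated value; invented for illustration
plot_id <- factor(c("NIWO_061", "NIWO_064", "NIWO_061", "NIWO_040"))

unique(plot_id)          # each distinct ID once
length(unique(plot_id))  # how many distinct plots there are
summary(plot_id)         # how many records belong to each plot
```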
ggplot(Litter.data, aes(x = functionalGroup)) + geom_bar()
Using geom_boxplot and geom_violin, create
a boxplot and a violin plot of dryMass by functionalGroup.
ggplot(Litter.data) + geom_boxplot(aes(x = dryMass, y = functionalGroup))
ggplot(Litter.data) + geom_violin(aes(x = dryMass, y = functionalGroup), draw_quantiles = c(0.25,
0.5, 0.75))
## Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
## collapsing to unique 'x' values
## Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
## collapsing to unique 'x' values
## Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
## collapsing to unique 'x' values
Why is the boxplot a more effective visualization option than the violin plot in this case?
Answer: The boxplot lets you see where the majority of the dry mass values fall for each functional group, as well as the outliers. The violin plot, which displays the density distribution, is not as effective here.
What type(s) of litter tend to have the highest biomass at these sites?
Answer: Needles have the highest biomass at these sites.
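A numeric summary by group can back up what the plots suggest. A minimal sketch using tapply on invented values (the litter data frame, its group names, and its masses are assumptions, not the NEON data):

```r
# Toy litter records; groups and masses are invented for illustration
litter <- data.frame(
  functionalGroup = c("Needles", "Needles", "Twigs/branches",
                      "Twigs/branches", "Leaves"),
  dryMass = c(4.0, 6.0, 1.0, 3.0, 0.5)
)

# Mean dry mass per functional group, largest first
mean_mass <- tapply(litter$dryMass, litter$functionalGroup, mean)
sort(mean_mass, decreasing = TRUE)

# Name of the group with the highest mean dry mass
top_group <- names(which.max(mean_mass))
top_group
```

Swapping mean for median would be less sensitive to the outliers visible in the boxplot.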